It is argued that the two problems of choosing characterizations and models of complex systems 
should not be considered independently. A particular criterion for these choices, oriented on the 
potential usefulness of the results, is considered, and a generic formalization applicable to realistic 
experiments is developed. It is applied to Kuramoto-Sivashinsky chaos. 



PACS numbers: 05.65.+b, 05.45.-a, 07.05.Tp, 01.70.+Bw 



The systematic characterization of self-organized 
structures is a long-standing challenge to the science 
of structure formation. 'Labyrinths', 'breathers', 'dendrites', 
'worms', 'spiral-defect chaos', or 'scale-free networks' 
are only a few of the words that were introduced to 
describe real and numerical experimental observations. 
Images and natural language can usually communicate 
what is meant, but as the number of observed structures 
increases and distinctions become finer, a more systematic 
approach seems desirable. The problem is felt particularly 
strongly for the large variety of spatially irregular 
structures and spatio-temporally chaotic states that have 
been found [1]. 

In search of appropriate characterizations, researchers 
often concentrate on those properties of the experimental 
data that are easily modeled - those properties 
of the data or the underlying structures that are gov- 
erned by their own rules (one might call them "coherent 
structures"). In this case, the choice of the character- 
ization depends on the available models. On the other 
hand, only when a particular set of properties of exper- 
imental data has been found to be characteristic for an 
observed structure, one can meaningfully ask for a model 
that reproduces this structure, i.e., a model that repro- 
duces data with these properties. Modeling requires prior 
characterization. 

Intuition is the fallback most researchers rely on when 
facing this circular relationship of modeling and charac- 
terizing. In fact, intuition is an excellent guide. But for 
some problem areas, e.g., those involving spatio-temporal 
chaos, progress appears to have slowed down also due to a 
lack of intuition about what the characteristic properties 
and what appropriate models are. Even when intuition 
is successful in choosing models and characterizations, it 
is legitimate to ask if these choices are subjective in the 
sense that they depend essentially on the way humans 
observe the world (other beings might decide very differently), 
or if they are the solution of some objective problem 
that our intuition is just highly efficient in solving. 
Most of the approaches to the related problem of emer- 
gence (e.g., I^yl) are based on the a priori assumption 
of some limitation to observation (coarse graining), thus 
involving an "inherently subjective" component. For 
an argument in favor of the objectivity of the choices it 
is therefore important to formulate a criterion that does 
not depend on such artificial limitations. 

Here, a proposal for such a criterion is introduced. 
It is first stated on a heuristic level and then modeled 
in a mathematical language; thus modeling the prob- 
lem of modeling. This involves the combination of con- 
cepts from computer science that proved powerful in the 
context of structure formation - algorithmic complex- 
ity (program length) 13, |^ and computational complex- 
ity (execution time) [3 - with ideas from statistical test 
theory 0] . It is shown that the circular relation between 
models and characterizations is, in this case, not vicious: 
the criterion leads to nontrivial choices. As an example, 
the formalism is applied to the spatio-temporally chaotic 
solutions of the Kuramoto-Sivashinsky Equation. 

Consider the following requirements for models and 
characterizations: 

Characterizations should be easily communicated 
and verified, be specific, and should, over a wide 
control-parameter range, apply to experimental data 
and be reproducible in models. (1) 

Models should be easily communicated and easily 
evaluated, show few artifacts, and reproduce given 
characterizations. (2) 

The practical relevance of most of these requirements is 
obvious. To see why it is desirable that characteriza- 
tions are reproducible in models, notice that, from such 
models, larger, composed models could be constructed, 
that can then be used to explore and characterize situa- 
tions not accessible experimentally (e.g., climate models). 
Even though the existence of models of sub-systems that 
reproduce the properties relevant for the composed model 
is not guaranteed, if they exist, it is good to 
know them. Now, as the general criterion, choose those 
pairs of models and characterizations that jointly satisfy 
conditions (1) and (2) as well as possible. 

In order to formalize this criterion and make it acces- 
sible to a rigorous analysis, both characterizations and 
models are represented by computer programs: programs 
that test data for particular properties, and programs 
that generate data. The practical use of these programs 
is illustrated in Fig. 1. 

FIG. 1: (a) Generic setup of a computer-controlled experiment. 
(b) Data flow in a test of a computational model. 



Figure 1a shows a generic setup for a computer-controlled 
experiment. The experimenter enters some 
control parameter values at a console. The set of control 
parameter values is encoded in a binary string $x \in C$, 
where the control parameter format $C$ is a subset of the 
set $\{0,1\}^n$ of all binary strings of length $n$ for some 
$n \in \mathbb{N}_0$. Based on these control parameter values, the 
control parameters of the experiment are adjusted, usually 
via some D/A conversion. 

When the experiment is run, experimental data is 
recorded in binary form. Data is encoded in a binary 
string $y \in D$, where the data format $D$ is a subset of 
the set $\{0,1\}^m$ of all binary strings of length $m$ for some 
$m \in \mathbb{N}_0$. This could, for example, be image data or a time 
series. Generally, $y$ is a realization of a random variable 
$Y$ with values in $D$. The experiment is assumed reproducible 
in the sense that repeated runs of the experiment 
yield a sequence $Y_1, Y_2, \ldots$ of statistically independent, 
identically distributed (i.i.d.) results. 

A characterization is represented by a program t that 
computes a statistical test on experimental data: A 
test takes a control parameter $x \in C$ as input, runs, 
and then either halts with output e or requests a finite 
number of (re)runs of the experiment and then halts 
with an output of 1 or 0. By the output e the test 
t indicates that x is not within its range of validity 
$C[t] := \{x \in C \mid \text{output of } t \text{ with input } x \text{ is not e}\}$ [10]. 
The outputs 1 or 0 indicate that the null hypothesis (see 
below) is accepted or rejected by the test, respectively. 
When the test requests an experimental rerun, its execution 
is suspended until the experimental result y is 
written into a dedicated storage accessible by the test. 

A model is represented by a computer program g that 
generates data to be used in place of experimental data 
(Fig. 1b). A generator takes a control parameter $x \in C$ 
as input, runs, outputs data $y \in D$ and halts. In order 
to produce random results, the program has access to a 
source of independent, evenly distributed random bits. 
Subsequent runs of a generator are fully independent. 
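
The roles of tests and generators as programs can be illustrated by a small sketch. It is not the setup used in this work: the names `toy_generator`, `toy_test`, the 8-bit data format, and the two admissible control parameters "0" and "1" are assumptions chosen only to show the interface (input x, requested reruns, outputs e, 1, or 0).

```python
import random

OUT_OF_RANGE = "e"  # output e: x is outside the test's range of validity


def toy_generator(x, rng):
    """Hypothetical generator g: one run yields 8 bits whose bias depends on x."""
    p_one = 0.25 if x == "0" else 0.75
    return tuple(1 if rng.random() < p_one else 0 for _ in range(8))


def toy_test(x, request_run, n_runs=100):
    """Hypothetical test t: returns "e", or 1 (accept) / 0 (reject)."""
    if x not in ("0", "1"):
        return OUT_OF_RANGE              # x is not in C[t]
    expected = 0.25 if x == "0" else 0.75
    ones = 0
    for _ in range(n_runs):              # each call requests one (re)run
        ones += sum(request_run())
    mean = ones / (8 * n_runs)
    return 1 if abs(mean - expected) < 0.1 else 0


rng = random.Random(0)
# The test is fed by a generator in place of the experiment.
verdict_same = toy_test("0", lambda: toy_generator("0", rng))
verdict_other = toy_test("0", lambda: toy_generator("1", rng))
```

Passing the data source as a callable mirrors the description above: the test does not know whether a run comes from the experiment or from a generator.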

As in conventional statistical test theory, the power 
function is introduced. Denote by $t_x(\{y_i\})$ the output 
of the test t at control parameter $x \in C[t]$ when applied 
to the sequence of experimental results $\{y_i\} \in D^\infty$ (for 
formal simplicity, the sequences $\{y_i\}$ are assumed infinite, 
even though the tests use only finite subsequences). Let 
$\{Y_i\}$ be a sequence of i.i.d. random results with values 
in $D^\infty$. Define the power of the test function $t_x$ when 
applied to $\{Y_i\}$ as the probability to reject $\{Y_i\}$, i.e., 

$$\mathrm{pow}(t_x, \{Y_i\}) := \Pr[\,t_x(\{Y_i\}) = 0\,] \quad (x \in C[t]). \tag{3}$$

Unlike in conventional test theory, there is no independent 
null hypothesis $H_0$ here that states the distribution 
or the class of distributions of $\{Y_i\}$ that is tested for. Instead, 
given a test function $t_x$, the null hypothesis, i.e., 
the class of distributions, is defined by the condition 

$$\mathrm{pow}(t_x, \{Y_i\}) \le \alpha, \tag{4}$$

where $0 < \alpha < 1$ is a fixed significance level. 
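
As a rough illustration of the power function, the rejection probability of a test can be estimated by repeating the test many times on fresh data. The sketch below is an assumption-laden toy, not part of the formalism: `mean_test`, `fair_bit`, and `biased_bit` are hypothetical, and the data format is a single bit per run.

```python
import random

ALPHA = 0.1  # significance level, as used in the example later in the text


def estimate_power(test_fn, draw, n_trials, rng):
    """Monte Carlo estimate of the probability that the test outputs 0."""
    rejections = 0
    for _ in range(n_trials):
        if test_fn(lambda: draw(rng)) == 0:
            rejections += 1
    return rejections / n_trials


def mean_test(request_run, n_runs=100):
    """Hypothetical test: accept (1) if the mean of 100 bits is near 1/2."""
    mean = sum(request_run() for _ in range(n_runs)) / n_runs
    return 1 if abs(mean - 0.5) <= 0.1 else 0


def fair_bit(rng):
    return rng.randint(0, 1)


def biased_bit(rng):
    return 1 if rng.random() < 0.9 else 0


rng = random.Random(1)
pow_fair = estimate_power(mean_test, fair_bit, 300, rng)
pow_biased = estimate_power(mean_test, biased_bit, 300, rng)
```

In this toy setting the fair source is rejected in only a few percent of the trials, so it belongs to the null hypothesis defined by the test at the level `ALPHA`, while the biased source is rejected essentially always.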

The ease or difficulty of communicating a test t or 
model g, mentioned in requirements (1,2), is measured 
by the lengths $L(t)$, $L(g)$ of the programs t and g. The 
value of $L(\cdot)$ depends on the machine model. In the example 
below, MMIX, an idealized modern microprocessor, 
is used [8]. 

The ease or difficulty of verifying characterizations and 
evaluating models is measured by the execution times 
$T(g)$, $T(t)$ of the programs. To be specific, define $T(\cdot)$ 
as the maximum of the expectation value of the run-time 
over all $x \in C$ and all distributions of data. Below, 
time is measured by the number of "oops" (symbol: $1\upsilon$) 
counted by the MMIX emulation mmix-sim [8]. 

The often-encountered tradeoff between L and T is 
taken into account by assuming that there is a cost func- 
tion depending on both resources, which increases strictly 
monotonically with L at fixed T and with T at fixed L 
but is otherwise unspecified. With this in mind, define 
the relations $\preceq$ (always cheaper or equal) and $\prec$ (always 
cheaper) for programs $p_1$, $p_2$ by 

$$p_1 \preceq p_2 \;:\Leftrightarrow\; L(p_1) \le L(p_2) \text{ and } T(p_1) \le T(p_2) \tag{5}$$

and 

$$p_1 \prec p_2 \;:\Leftrightarrow\; p_1 \preceq p_2 \text{ and not } p_2 \preceq p_1. \tag{6}$$

It turns out that the machine dependence of relations 
$\preceq$ and $\prec$ for implementations of algorithms on different 
processor models is weak. In principle, other resources 
could also be taken into account in definition (5) such as, 
for tests, the number of experimental runs required. 

Since for every program p there is only a finite number 
of programs with smaller or equal length, there is also 
only a finite number of programs $p'$ such that $p' \preceq p$ or 
$p' \prec p$. Below we need Lemma 1: Every nonempty set 
P of tests or generators contains an element p which is 
minimal with respect to the relation $\prec$, i.e., such that no 
$p' \in P$ satisfies $p' \prec p$. This is a direct consequence of 
the previous note and the transitivity and antireflexivity 
of $\prec$. In general, there are several minimal elements, each 
using its own mix of resources. This reflects the intuition 
that there are several "good" models and characterizations 
for one experiment. 
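
The two cost relations and the minimal elements of Lemma 1 can be sketched as follows. The `Program` record and the example lengths and run times are hypothetical; only the comparison logic follows the definitions of "always cheaper or equal" and "always cheaper" given above.

```python
from typing import NamedTuple


class Program(NamedTuple):
    name: str
    length: int    # L(p), e.g. in bytes
    time: float    # T(p), e.g. in oops


def cheaper_or_equal(p1, p2):
    """p1 is never more expensive than p2 in either resource."""
    return p1.length <= p2.length and p1.time <= p2.time


def cheaper(p1, p2):
    """p1 is cheaper or equal, and p2 is not cheaper or equal in return."""
    return cheaper_or_equal(p1, p2) and not cheaper_or_equal(p2, p1)


def minimal_elements(programs):
    """Programs for which no other program in the set is always cheaper."""
    return [p for p in programs if not any(cheaper(q, p) for q in programs)]


a = Program("short_but_slow", 100, 10.0)
b = Program("long_but_fast", 200, 5.0)
c = Program("long_and_slow", 200, 10.0)
front = minimal_elements([a, b, c])
```

Here `a` and `b` are both minimal, each with its own mix of resources, while `c` is excluded because both `a` and `b` are always cheaper.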

These concepts from statistics and computer science 
are now combined to formalize requirement (2), except 
for the condition regarding artifacts. Denote by $g_x$ the 
sequence $\{Y_i\}$ of random outputs of generator g at control 
parameter x. Define for given C, D the notion of 
an optimal generator g relative to a test t and a power 
threshold $1 > \gamma > \alpha$ by 

$$\mathrm{opt}_t^\gamma\, g \;:\Leftrightarrow\; \bigwedge_{x \in C[t]} \mathrm{pow}(t_x, g_x) \le \alpha \quad\text{and} \tag{7a}$$

$$\bigwedge_{g' \prec g}\ \bigvee_{x \in C[t]} \mathrm{pow}(t_x, g'_x) > \gamma \quad\text{and} \tag{7b}$$

$$\bigwedge_{g' \preceq g}\ \bigvee_{x \in C[t]} \mathrm{pow}(t_x, g'_x) \ge \mathrm{pow}(t_x, g_x), \tag{7c}$$

where the quantifiers $\bigwedge$ (for all) and $\bigvee$ (there is) have 
been introduced for brevity. Line (7a) states that g satisfies 
t, line (7b) says that all cheaper generators are rejected 
by t with power $> \gamma$, and line (7c) handles the 
generators that use the same resources as g. The test t 
is specific to g in the sense that it does not apply to any 
$g' \prec g$. 

In order to disentangle the circularity between models 
and characterizations, consider now the problem of specifying 
a generator by characterizing its output. For an i.i.d. 
random sequence $\{Y_i\}$ denote by $p[\{Y_i\}]$ the distribution 
function of its elements, i.e., $p[\{Y_i\}](y) := \Pr[Y_1 = y]$ 
for $y \in D$. Call a generator g an optimal implementation 
with respect to a set $C' \subseteq C$ iff there is no generator $g' \prec g$ 
such that $p[g'_x] = p[g_x]$ for all $x \in C'$. Theorem 1: Given 
C and D, there is for every $C' \subseteq C$, every optimal implementation 
g with respect to $C'$, and every $1 > \gamma > \alpha$, 
a test t such that $\mathrm{opt}_t^\gamma\, g$ and $C[t] = C'$. Outline of the 
proof: Explicitly construct t. $x \in C'$ can be tested for by 
keeping a list of $C'$ in t. Since there is only a finite number 
of $g' \preceq g$, the test must distinguish $p[g_x]$ from a finite 
number of different distributions $p[g'_x]$ for all $x \in C'$, with 
certainty $\gamma$ if $g' \prec g$. This can be achieved by comparing 
a sufficiently accurate representation of $p[g_x]$, stored in t 
for all $x \in C'$, with a histogram sampled from $g'_x$. With a 
high number of samples, any degree of certainty can be 
reached. □ The cost of testing is not taken into account 
yet. 
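
The construction in the proof outline, a stored reference distribution compared with a sampled histogram, might look as follows in a minimal form. This is a sketch under assumptions: the total-variation distance, the tolerance, and the one-bit data format are choices made here for illustration and are not specified in the proof.

```python
import random
from collections import Counter


def histogram_test(reference, request_run, n_samples, tolerance):
    """Accept (1) if the sampled histogram is close to the stored distribution.

    reference maps each outcome y to its stored probability p[g_x](y).
    """
    counts = Counter(request_run() for _ in range(n_samples))
    outcomes = set(reference) | set(counts)
    distance = 0.5 * sum(
        abs(counts[y] / n_samples - reference.get(y, 0.0)) for y in outcomes
    )
    return 1 if distance <= tolerance else 0


rng = random.Random(2)
reference = {0: 0.5, 1: 0.5}          # distribution stored inside the test
accept_same = histogram_test(reference, lambda: rng.randint(0, 1), 2000, 0.1)
reject_other = histogram_test(
    reference, lambda: 1 if rng.random() < 0.9 else 0, 2000, 0.1
)
```

Increasing `n_samples` narrows the statistical scatter of the histogram, which is the sense in which any degree of certainty can be reached, at a testing cost that this construction ignores.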

The following definition formalizes the criterion stated 
above for choosing models and characterizations: to find 
pairs $(t, g)$ jointly satisfying conditions (1,2) as well as 
possible. Only the validity of characterizations for experiments 
is not contained in the definition: Given C 
and D, call a pair $(t, g)$ a basic model specifying characterization 
(b.m.s.c.) iff there is a $1 > \gamma > \alpha$ such that 
$\mathrm{opt}_t^\gamma\, g$ and there is no $t' \prec t$ with $C[t] \subseteq C[t']$ and $\mathrm{opt}_{t'}^\gamma\, g$. 

This optimization with respect to t implies the avoid- 
ance of artifacts, when artifacts are considered as proper- 
ties that are specific and are cheaper to communicate and 
verify than the property t that g is supposed to model. 
The definition of a b.m.s.c. involves the simultaneous 
minimization of cost with respect to t and g. An an- 
swer to the question if there are any nontrivial solutions 
to this double optimization problem - i.e., if the circular 
relation between models and characterizations as considered 
here is vicious - is given by Theorem 2: Given C 
and D, there is, for every $C' \subseteq C$ and every optimal implementation 
g with respect to $C'$, a test t such that $(t, g)$ 
is a b.m.s.c. and $C' \subseteq C[t]$. Proof: Fix some $1 > \gamma > \alpha$. 
By Theorem 1, the set $S := \{t \mid \mathrm{opt}_t^\gamma\, g \text{ and } C' \subseteq C[t]\}$ is 
nonempty. Theorem 2 is satisfied by any $t \in S$ which is 
minimal with respect to the half ordering $\prec$. By Lemma 1 
such an element exists. □ 

Only for a few b.m.s.c. $(t, g)$ does the test t also apply 
to a given experiment. Generally, there will be some 
fundamental level of description (the Schrödinger equation, 
say) at which a 1-to-1 model g of the experiment 
can be constructed, and then a corresponding t exists by 
Theorem 2. But these b.m.s.c. are often too expensive. 
Finding cheaper b.m.s.c. that apply to the data requires 
intuition, insight, and experience, and goes beyond the 
scope of this work. The goal here was only to investigate 
if an objective, well-posed problem of modeling and char- 
acterizing exists, and to model it so that among several 
solutions conceived some are selected. 

As an example for an application, assume some idealized 
experiment, the fundamental description of which is 
given by the Kuramoto-Sivashinsky (KS) equation 

$$\partial_\tau u = -\partial_\xi^2 u - \partial_\xi^4 u + u\,\partial_\xi u, \tag{8}$$

with periodic boundary conditions $u(\tau, \xi) = u(\tau, \xi + S)$, 
as they apply for experiments in a ring-channel geometry. 
In each experimental run, 128 equally spaced points of 
u at distance $\Delta\xi = 0.8$ ($S = 128 \times \Delta\xi$) are sampled 
200 times in $\Delta\tau = 0.2$ sampling intervals, while u is 
evolving along the chaotic KS attractor. The data format 
D is given by all sequences of $128 \times 200 = 25600$ 8-byte 
floating point numbers. There is no control parameter: 
x is always the empty string and the only element of C. 
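
The sizes implied by this data format can be checked with a few lines of arithmetic. The constant names are chosen here for illustration; the numbers are those quoted above.

```python
N_POINTS = 128         # sampled points per snapshot
N_SAMPLES = 200        # snapshots per run
DELTA_XI = 0.8         # spacing between sampled points
DELTA_TAU = 0.2        # sampling interval
BYTES_PER_NUMBER = 8   # 8-byte floating point numbers

system_length = N_POINTS * DELTA_XI           # S
duration = N_SAMPLES * DELTA_TAU              # time covered by one run
numbers_per_run = N_POINTS * N_SAMPLES        # entries of one data record y
bytes_per_run = numbers_per_run * BYTES_PER_NUMBER
bits_per_run = 8 * bytes_per_run              # length m of the binary string y
```

With these values the system length is 102.4, one run covers 40 time units, and each record y is a binary string of 1638400 bits.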

The systematic construction of a pair $(t, g)$ likely to be 
a b.m.s.c. of the experiment goes from the generator g 
over a corresponding test t to a verification that the experiment 
passes the test t. Practically finding a suitable 
g requires a preliminary approximation of t characterizing 
the experiment. This first, exploratory step is not 
described here. 
FIG. 2: Modeling and characterization of KS chaos. (a) Gray 
coding of the output y of model g that approximates Eq. (8). 
(b) Regions where $\partial^2 u(\tau, \xi)/\partial\xi^2 > 0$ (black) as used in the 
"tree-test" t. (c) A precise solution of Eq. (8) for comparison. 

The code for the generator g is a minimal-length implementation 
of Eq. (9) on an MMIX processor. A discretization 
$u_{p,q}$ locally approximately proportional to a 
solution $u(10^{-1} p\,\Delta\tau, q\,\Delta\xi)$ of Eq. (8) is obtained by an Euler 
integration with in-place update of the form 

$$\ldots + (c_3 + \ldots - u_{p+1,q-1})\,u_{p,q}, \tag{9}$$

where $(c_1, c_2, c_3) \approx (-0.05, 0.18, 0.75)$. Including code 
to handle the periodic boundaries, to initialize $u_{0,q}$ with 
random numbers $O(10^{-\ldots})$, to drop a transient of 16 time 
units, and to output y (Fig. 2a), this requires $L(g) = 260$ 
bytes and $T(g) = 34\,\mathrm{M}\upsilon$ for a single run. 
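
For orientation, a plain explicit Euler step for the KS equation on a periodic grid can be sketched as below. This is not the generator g: it uses a separate output array instead of the in-place update, keeps u unscaled, and uses standard central differences. The step `DT = 0.02` (one tenth of the sampling interval) is an assumption; with the grid spacing 0.8 it gives linear stencil weights of roughly -0.049, 0.164, and 0.770, which lie near the quoted constants $(c_1, c_2, c_3)$.

```python
import random

N_POINTS = 128   # grid points per snapshot, from the text
DX = 0.8         # grid spacing, from the text
DT = 0.02        # ASSUMED Euler step: one tenth of the 0.2 sampling interval


def euler_step(u):
    """One explicit Euler step of u_t = -u_xx - u_xxxx + u u_x on a ring."""
    n = len(u)
    new = [0.0] * n
    for q in range(n):
        um2, um1, u0 = u[q - 2], u[q - 1], u[q]   # negative indices wrap around
        up1, up2 = u[(q + 1) % n], u[(q + 2) % n]
        d1 = (up1 - um1) / (2.0 * DX)
        d2 = (up1 - 2.0 * u0 + um1) / DX**2
        d4 = (up2 - 4.0 * up1 + 6.0 * u0 - 4.0 * um1 + um2) / DX**4
        new[q] = u0 + DT * (-d2 - d4 + u0 * d1)
    return new


def run(n_steps, seed=0):
    """Integrate from small random initial data (amplitude chosen here)."""
    rng = random.Random(seed)
    u = [1e-3 * (rng.random() - 0.5) for _ in range(N_POINTS)]
    for _ in range(n_steps):
        u = euler_step(u)
    return u


snapshot = run(200)
```

A run that reaches the chaotic attractor would need many more steps plus the dropped transient mentioned above; the short run here only exercises the update rule.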

For the test t, a code is used that implicitly computes 
the stripes $\partial_\xi^2 u(\tau, \xi) > 0$ (Fig. 2b) using data of every 
k-th sampling interval ($k \approx 5$). Then it determines for 
each of $N = 20$ runs the total numbers of beginnings $n_b$, 
ends $n_e$, mergers $n_m$, and splits $n_s$ of the stripes along 
the time axis, as well as the average number $l$ of stripes. 
The value of N implicitly determines S. 
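
The first stage of such a test, finding the stripes of positive curvature and counting them, can be sketched as follows. This covers only the stripe count per snapshot; tracking beginnings, ends, mergers, and splits along the time axis is not attempted. The function names and the use of a periodic second difference are assumptions made for this illustration.

```python
import math


def stripe_mask(row):
    """True where the periodic second difference of the snapshot is positive."""
    n = len(row)
    return [row[q - 1] - 2.0 * row[q] + row[(q + 1) % n] > 0.0 for q in range(n)]


def count_stripes(mask):
    """Number of maximal runs of True on a ring."""
    n = len(mask)
    if all(mask):
        return 1
    return sum(1 for q in range(n) if mask[q] and not mask[q - 1])


def mean_stripe_count(data, k=5):
    """Average stripe count, using every k-th snapshot of one run."""
    rows = data[::k]
    return sum(count_stripes(stripe_mask(r)) for r in rows) / len(rows)


# Synthetic snapshot with three wavelengths on a ring of 48 points.
row = [math.cos(2.0 * math.pi * 3 * (q + 0.5) / 48) for q in range(48)]
n_stripes = count_stripes(stripe_mask(row))
average = mean_stripe_count([row] * 10, k=5)
```

For the synthetic cosine snapshot the curvature is positive wherever the cosine is negative, which gives one stripe per wavelength.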

If a combination $(n_b, n_e, n_m, n_s, l)$ is repeated for two 
runs, the test rejects the data stream in order to enforce 
randomness. The averages $\bar n_b, \bar n_e, \bar n_m, \bar n_s, \bar l$ of these 
statistics over all N runs are determined. The data is 
rejected if $\bar n_e > m_e \approx 15.2$ or $\bar n_s > m_s \approx 0.4$, which 
enforces the tree-like geometry of the stripes and consequently 
a minimal accuracy of g. Data is rejected if 
$(\bar l - m_l)^2 > v_l$ or $(\bar n_b - m_b)^2 > v_b$, which sets the length 
and time scales of the tree structure [$(m_l, v_l, m_b, v_b) \approx 
(14, 0.03, 22, 1.5)$]. Finally, data is rejected if the difference 
between the initial and final number of stripes is 
large, i.e., $(\bar n_e + \bar n_m - \bar n_b - \bar n_s)^2 > v_\Delta \approx 1.7$, which enforces 
the suppression of a transient in the generator. Within 
the statistical error, t accepts g at the $\alpha = 0.1$ significance 
level: $\mathrm{pow}(t_x, g_x) = 0.105(3) \le \alpha$. Using precise numerical 
simulations of Eq. (8), it was verified that solutions 
of the fundamental description (Fig. 2c) are rejected by 
t with a probability of only $0.03(1) < \alpha$. That is, t characterizes 
the "experimental" data and is even robust to 
small deviations from the fundamental description (8). 
A compiler-optimized implementation [9] of t requires 
$L(t) = 1192$ bytes and $T(t) = 3.8\,\mathrm{M}\upsilon = N \times 0.19\,\mathrm{M}\upsilon$. 
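
The acceptance rule described in this paragraph can be written down compactly once the per-run stripe statistics are available. The sketch below takes those statistics as given tuples; the function name, the tuple layout (beginnings, ends, mergers, splits, average stripe number), and the example data are assumptions, while the thresholds are the approximate values quoted above.

```python
M_E, M_S = 15.2, 0.4     # upper bounds on mean ends and mean splits
M_L, V_L = 14.0, 0.03    # target and tolerance for the mean stripe number
M_B, V_B = 22.0, 1.5     # target and tolerance for the mean beginnings
V_DELTA = 1.7            # tolerance for the net change in stripe number


def tree_test_decision(runs):
    """Return 1 (accept) or 0 (reject) for a list of per-run statistics."""
    if len(set(runs)) < len(runs):
        return 0                       # a repeated combination: not random
    n = len(runs)
    nb, ne, nm, ns, l = (sum(r[i] for r in runs) / n for i in range(5))
    if ne > M_E or ns > M_S:
        return 0                       # stripes are not tree-like
    if (l - M_L) ** 2 > V_L or (nb - M_B) ** 2 > V_B:
        return 0                       # wrong length or time scale
    if (ne + nm - nb - ns) ** 2 > V_DELTA:
        return 0                       # stripe number drifts: transient left
    return 1


# Hypothetical statistics for 20 runs; the last entry differs in every run.
good_runs = [(22, 15, 7, 0, 14.0 + 0.001 * i) for i in range(20)]
repeated_runs = [(22, 15, 7, 0, 14.0)] * 20
too_many_ends = [(22, 16, 6, 0, 14.0 + 0.001 * i) for i in range(20)]
```

In the hypothetical accepted data set, beginnings are balanced by ends plus mergers, so the number of stripes does not drift over a run.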

In principle, the precise values of the tuning parameters 
in t could be determined by locally solving the optimization 
problem for the condition for the pair $(t, g)$ to 
be a b.m.s.c. to the precision of the coding of the parameters. 
Regarding the question if this pair is also a global 
solution of the optimization problem for a b.m.s.c., it 
can only be said that this is a plausible conjecture. It 
has been checked that the direct verification of Eq. (8) 
would yield a test that is shorter than the tree-test t, 
but requires much more time. Likewise, generators more 
explicitly coded to generate tree structures accepted by 
t could be faster than g, but the examples investigated 
indicate that, due to several conditional branches, they 
would always be longer. Thus, no counterexamples could 
be found. Notice that the information reduction performed 
by t in concentrating on the stripes $\partial_\xi^2 u > 0$ is 
not externally imposed. Rather, it is a consequence 
of the rather small number of competing generators to 
be excluded. 

To the degree that the pair $(t, g)$ described here is a 
b.m.s.c., it is also of practical relevance. The tree-test t 
provides a fast, rather simple, and robust way to identify 
KS chaos. There seems to be no other simple "explanation" 
for the structure identified by t. On the other 
hand, g provides a simple and, as it turns out, comparatively 
fast method to obtain approximations of KS chaos 
on digital computers, which is important whenever resources 
are scarce. 

A formal scheme combining computation and statistics 
for choosing models and characterizations has been laid 
out. It models the main aspects of the practical problem. 
The question if the choices are "intuitive" is presumably 
hard to answer systematically. At least, it has been ar- 
gued, they are useful: not because nature is a computer, 
but because people use computers. 